Assured Reinforcement Learning with Formally Verified Abstract Policies

Authors

  • George Mason
  • Radu Calinescu
  • Daniel Kudenko
  • Alec Banks
Abstract

We present a new reinforcement learning (RL) approach that enables an autonomous agent to solve decision-making problems under constraints. Our assured reinforcement learning approach models the uncertain environment as a high-level, abstract Markov decision process (AMDP), and uses probabilistic model checking to establish AMDP policies that satisfy a set of constraints defined in probabilistic temporal logic. These formally verified abstract policies are then used to restrict the RL agent’s exploration of the solution space so as to avoid constraint violations. We validate our RL approach by using it to develop autonomous agents for a flag-collection navigation task and an assisted-living planning problem.
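The idea of restricting exploration with a verified abstract policy can be illustrated with a minimal sketch. It is not the authors' implementation: the corridor environment, the `abstract_state` mapping, the hazard cell, and the `VERIFIED_ACTIONS` table are all hypothetical, and the table is written by hand as a stand-in for what a probabilistic model checker would produce. The sketch runs tabular Q-learning in which the agent only ever chooses among actions permitted in its current abstract state.

```python
import random
from collections import defaultdict

# Hypothetical 1-D corridor: cells 0..5, start at 2, goal at 5.
# Cell 0 stands for a region that violates a constraint.
N_CELLS, START, GOAL = 6, 2, 5
ACTIONS = (-1, +1)  # move left, move right


def abstract_state(cell):
    """Map a low-level cell to a coarse abstract state (assumed abstraction)."""
    return "near_hazard" if cell <= 1 else "safe"


# Hand-written stand-in for a verified abstract policy: the actions
# permitted in each abstract state. A real pipeline would obtain this
# from probabilistic model checking rather than hard-coding it.
VERIFIED_ACTIONS = {"near_hazard": (+1,), "safe": ACTIONS}


def allowed_actions(cell):
    return VERIFIED_ACTIONS[abstract_state(cell)]


def step(cell, action):
    """Deterministic move with a small step cost and a goal reward."""
    nxt = min(max(cell + action, 0), N_CELLS - 1)
    reward = 1.0 if nxt == GOAL else -0.01
    return nxt, reward, nxt == GOAL


def train(episodes=200, alpha=0.5, gamma=0.95, epsilon=0.2, seed=0):
    """Epsilon-greedy Q-learning restricted to the permitted actions."""
    rng = random.Random(seed)
    q = defaultdict(float)
    visited = {START}
    for _ in range(episodes):
        cell = START
        for _ in range(100):  # cap on episode length
            acts = allowed_actions(cell)
            if rng.random() < epsilon:
                action = rng.choice(acts)
            else:
                action = max(acts, key=lambda a: q[(cell, a)])
            nxt, reward, done = step(cell, action)
            best_next = 0.0 if done else max(
                q[(nxt, a)] for a in allowed_actions(nxt)
            )
            target = reward + gamma * best_next
            q[(cell, action)] += alpha * (target - q[(cell, action)])
            visited.add(nxt)
            cell = nxt
            if done:
                break
    return q, visited


def greedy_path(q, max_steps=20):
    """Follow the learned greedy policy, still within permitted actions."""
    cell, path = START, [START]
    for _ in range(max_steps):
        action = max(allowed_actions(cell), key=lambda a: q[(cell, a)])
        cell, _, done = step(cell, action)
        path.append(cell)
        if done:
            break
    return path


if __name__ == "__main__":
    q, visited = train()
    print("visited cells:", sorted(visited))
    print("greedy path:", greedy_path(q))
```

Because the restriction is applied both when exploring and when acting greedily, the hazard cell is unreachable by construction in this toy setting; the safety of the low-level agent therefore rests entirely on the assumed abstraction and action table being sound.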

Similar resources

Tree Based Hierarchical Reinforcement Learning

In this thesis we investigate methods for speeding up automatic control algorithms. Specifically, we provide new abstraction techniques for Reinforcement Learning and Semi-Markov Decision Processes (SMDPs). We introduce the use of policies as temporally abstract actions. This is different from previous definitions of temporally abstract actions as we do not have termination criteria. We provide...

Algorithms for Batch Hierarchical Reinforcement Learning

Hierarchical Reinforcement Learning (HRL) exploits temporal abstraction to solve large Markov Decision Processes (MDP) and provide transferable subtask policies. In this paper, we introduce an off-policy HRL algorithm: Hierarchical Q-value Iteration (HQI). We show that it is possible to effectively learn recursive optimal policies for any valid hierarchical decomposition of the original MDP, gi...

Hierarchical reinforcement learning with subpolicies specializing for learned subgoals

This paper describes a method for hierarchical reinforcement learning in which high-level policies automatically discover subgoals, and low-level policies learn to specialize for different subgoals. Subgoals are represented as desired abstract observations which cluster raw input data. High-level value functions cover the state space at a coarse level; low-level value functions cover only parts...

Predictive Q-Routing: A Memory-based Reinforcement Learning Approach to Adaptive Traffic Control

In this paper, we propose a memory-based Q-learning algorithm called predictive Q-routing (PQ-routing) for adaptive traffic control. We attempt to address two problems encountered in Q-routing (Boyan & Littman, 1994), namely, the inability to fine-tune routing policies under low network load and the inability to learn new optimal policies under decreasing load conditions. Unlike other memory-ba...

Finding Structure in Reinforcement Learning

Reinforcement learning addresses the problem of learning to select actions in order to maximize one’s performance in unknown environments. To scale reinforcement learning to complex real-world tasks, such as typically studied in AI, one must ultimately be able to discover the structure in the world, in order to abstract away the myriad of details and to operate in more tractable problem spaces....

Publication date: 2017